OPERATING FREQUENCY REDUCTION 
FOR TRANSVERSAL FIR FILTER 

Field of the Invention 

[0001] The present invention relates generally to computer network communications and more
particularly to methods and systems that allow analog transversal FIR filters to operate at
ultra high frequencies. Specifically, the present invention relates to a method and a system
that allows the use of double-edge clocking to reduce the frequency of operation of a
transversal FIR filter whose general functionality can be used to implement a Feed Forward
Equalizer (FFE) and a Decision Feedback Equalizer (DFE). The invention is particularly
relevant for systems that operate at 10Gb/s or above, where the reduction in operating
frequency of a sub-block will result in reduced power consumption.




Background of the Invention 
Description of Related Art 

[0002] A standard transversal FIR filter includes a set of latches, a set of respective
multiplication elements, and a summing node. The order of the filter defines the number of
latches contained in the data FIFO, where each latch output can be used to drive a coefficient
value/gain required for the FIR filter functionality to be realized.

[0003] The latches of the transversal filter all operate using the same clock, referred to as
the High Speed Clock, which has a period T equal to the Unit Interval (UI) of the serial data
stream. In practical applications, each delay element is implemented using a Flip-Flop that
samples the data present on its input on a given clock edge, and holds the data value on its
output for the duration of a clock period.

[0004] In certain cases, it is advantageous to reduce the operating frequency of the clock
signal for reasons of technical feasibility or power consumption optimization. It is possible
to split the delay elements in the transversal filter into two groups, one of which latches
data on the rising edge of the clock signal, the other latching data on the falling edge of
the clock signal. This allows the use of a High Speed clock signal with a period T that is
twice the duration of a UI. It also implies that each data sample is held by its delay element
for two UI.

[0005] In order to improve Bit Error Rate performance in communications systems, a transversal
FIR filter is sometimes used in the receiver or the transmitter to correct for Inter-Symbol
Interference (ISI). An FFE is commonly used in a transmitter, while a receiver will generally
contain a DFE.

[0006] An FFE is an extension of a standard serializer transmit block, where data bits are
shifted through delay elements to be transmitted one at a time, but with a partial
contribution from other bits contained in the delay structure. An FFE serial transmitter
includes a set of delay elements, a set of multipliers, and a summing node. The delay elements
all operate using the same High Speed clock signal, and shift data forward on only one edge
(usually rising) of the clock. An FFE requires that the output of a delay element be held for
no more than one UI. Thus, the period T of the High Speed clock is generally equal to one UI
for proper functionality.

[0007] A DFE receiver block is an extension of a standard serial bit receiver block. A DFE
receiver block includes a slicer, a set of delay elements, a set of multipliers, and a summing
node. The slicer and delay elements all operate using the same High Speed clock signal, and
sample data on only one edge (usually rising) of the clock. A DFE requires that the data
sample be held at the output of a delay element for not more than one UI. Thus, in order for a
DFE receiver block to function correctly, the period T of the High Speed clock signal must be
equal to the Unit Interval of the incoming data stream.

[0008] In both the case of the FFE and DFE, increasing the High Speed clock period by a factor
of two would cause a functional failure. Therefore, there is a need to have an efficient
method and system that will allow a DFE to function using a double edge clocking scheme, so
that the frequency of operation of the transversal filter in an FFE or DFE can be reduced.

SUMMARY OF INVENTION 

[0009] The present invention is a method and system for reducing the frequency of operation
for a transversal Finite Impulse Response (FIR) filter. The transversal filter operates in
such a way that it has an even and an odd row of data, which are latched on rising and falling
edges of the clock respectively. This allows the clock frequency to be reduced by a factor of
2, and thus allows the use of more power efficient latches. Reducing the frequency of
operation causes the high speed latches within the transversal filter to hold the data bits
twice as long as is required, and thus a circuit is required to select the appropriate data
bit from the output of the appropriate half-speed latch, and subsequently scale it to apply
the coefficient gain. Each of the subsystems is analog, and operates in accordance with a
synchronous clock system.

[00010] In one particular embodiment, the present invention may be characterized as a method
and system to allow a transversal filter to operate at a reduced frequency while maintaining
the Finite Impulse Response that is required by the application. This is achieved by providing
latches that operate at a slower sub-multiple of the high speed clock and multiplexing the
output of the slower latches in such a way that the coefficient multipliers are driven by the
correct data, and for the correct duration. Advantageously, the reductions in frequency for
the transversal filter result in a high-speed circuit that may have considerably lower power
consumption than one that operates at full speed. Additionally, the multiplexing circuit may
directly apply the discrete gain required to create a coefficient, which by construction will
further reduce complexity, die area, and power consumption.

BRIEF DESCRIPTION OF THE DRAWINGS

[00011] FIG. 1 is a simplified block diagram of a point-to-point backplane communication
system;

[00012] FIG. 2 is a simplified block diagram of a functional architecture and internal
constructions of an exemplary 10Gb/s SerDes which is outlined in FIG. 1;

[00013] FIG. 3a is a simplified block diagram of a transversal FIR filter using single edge
clocking;

[00014] FIG. 3b is a simplified block diagram of an embodiment of a shift register matrix 331
constructed in accordance with the principles of the invention;

[00015] FIG. 4 is a simplified block diagram of the present invention;

[00016] FIG. 5 is a simplified block diagram of a transversal FIR filter stage that
incorporates the present invention;

[00017] FIG. 6 is a simplified timing diagram for the first stage of a transversal filter
using the present invention;

[00018] FIG. 7 is a simplified timing diagram for a generic stage of a transversal filter that
incorporates the present invention;

[00019] FIG. 8 is a simplified block diagram of an exemplary DFE that incorporates the present
invention;

[00020] FIG. 9 is a simplified block diagram of an exemplary FFE that incorporates the present
invention.



40226004.1 






DETAILED DESCRIPTION OF THE INVENTION

[00021] The present invention provides a method and a system for using a double-edge clocking
scheme and reducing the frequency of operation for a transversal FIR filter. The invention
comprises a set of 2:1 multiplexers whose output amplitude can be controlled such that it is
possible to apply a gain to the selected input signal. The invention is used in combination
with a transversal FIR filter that operates at one half the intended data rate. The
transversal filter comprises two separate sets of analog latches, where one set is positive
edge active and the other set is negative edge active.

[00022] The present invention can be used to implement very high-speed transversal FIR filters
where the frequency requirements may be at the very limit of some mainstream CMOS technologies
and geometries. Since the frequency of operation for the latches may be reduced by a factor of
2, the overall current consumption can also be reduced, and thus an overall reduction in power
consumption can be realized through the use of the present invention.



[00023] In order to appreciate the advantages of the present invention, it will be beneficial
to describe the invention in the context of an exemplary 10Gb/s Serializer/Deserializer
(SerDes). The particular implementation chosen is depicted in FIG. 1, which is a simplified
block diagram of a single pair communication system operating in half duplex mode over two
pairs of differential copper backplane traces.

[00024] In FIG. 1 the communication system is represented as a point-to-point system in order
to simplify the explanation, and includes two main SerDes blocks 100 and 102, coupled together
via two pairs of differential high-speed copper traces 128a and 128b. Each transceiver block
100 and 102 is capable of operating at a baud rate exceeding 10Gb/s in each direction. Each
transceiver 100 and 102 has a high-speed analog interface 110 and a low-speed digital
subsection 108. A phase matching handoff 106 guarantees timing between the digital interface
and analog interface.

[00025] In the case of data transmission, the TX parallel data is encoded in the PCS block 104
and is fed into the MUX 116 at an appropriately scaled lower frequency clock. Timing between
the digital and analog interfaces is guaranteed by the handoff 106. Encoded data is
multiplexed from a parallel format into a high-speed serial format at data rates exceeding
10Gb/s by the MUX block 116. Transmit data is equalized by the TX_EQ 118 and subsequently
transmitted into the channel 128 by an impedance matched analog TX_IO block 120.

[00026] In the case of data reception, RX_IO 126 provides an impedance-matched buffer between
the channel 128 and the input of the RX_EQ 124. The RX_EQ corrects the attenuation and jitter
introduced by the channel. A serial bit stream representing latch decisions of the equalized
data is then fed into the DEMUX 122, which in turn will de-serialize the data into a parallel
output word. The parallel data output of the DEMUX 122 feeds the PCS block 104 through the
handoff 106. The handoff also serves to retime the data and filter jitter introduced by the
channel 128b. The PCS block decodes the parallel data stream to reproduce the original
transmitted parallel data stream.

[00027] FIG. 2 is a simplified block diagram of the functional architecture and internal
constructions of an exemplary 10Gb/s SerDes 200, such as that described in FIG. 1. The
illustrative 10Gb/s SerDes, which includes the Receive (RX) and Transmit (TX) paths 202 and
230 respectively, will be referred to as the "SerDes".

[00028] The SerDes RX path includes an incoming serial data stream 204, RX_IO 126, High Pass
Filter (HPF) 206, Summing Node 208, Decision Feedback Equalizer (DFE) 216, Receive PLL
(RX_PLL) 212, DEMUX 122, RX Handoff 224, and the RX_PCS 226. The HPF 206 serves to pre-shape
the spectral content of the signal in such a way that data bit transitions are accentuated,
which serves to partially equalize the incoming data stream so that the RX_PLL 212 can use it
for clock recovery. The recovered clock signal on 228 is phase and frequency correlated with
the incoming serial data stream 204 and is used by the DFE 216, DEMUX 122, and the Handoff
224. The Handoff 224 serves to absorb low frequency jitter and guarantees that the RX_PCS 226
receives the data without any timing violations.

[00029] Based on the signal quality criteria determined by specialized analog circuitry 210,
an adaptation algorithm 214 drives the coefficient settings of the analog DFE 216. Since the
DFE is based on a feedback mechanism, the perfectly equalized data stream is formed at the
summing node 208, where the feedback response of the DFE 216 and the feedforward response of
the HPF 206 are linearly added to form the totally equalized data stream. The DFE contains a
transversal filter, which is essentially a shift register. The output data stream 222 of the
DFE represents logical decisions made from the equalized data stream at the summing node 208.
The depth of the DFE 216 governs the latency of the data stream with respect to the input of
the channel. The DEMUX 122 contains multiple stages of 2:1 demultiplexers, which serve to
generate a lower speed parallel data bus 220 that will be processed by the RX_PCS.

[00030] The SerDes TX path includes the TX_PCS 232, TX Handoff 234, MUX 116, TX Equalizer
(TX_EQ) 118, TX_PLL 246, TX_IO 120, and output data stream 250. Using a reference clock 248,
the TX_PLL 246 generates jitter free high-speed clocks that will drive the TX Handoff 234, MUX
116, and TX_EQ 118. The MUX contains a series of 2:1 multiplexers and is designed to transform
a parallel data stream 238 into a high-speed serial bit stream 242. The TX_EQ 118 uses the
outgoing serial bit stream 242 to generate the pre-shaped bit stream 250 that is launched into
the data channel via the output IO buffer 120. Transmit equalization is often used to
complement or enhance receiver-based equalization because of its ease of implementation and
straightforward operation.

[00031] The TX_EQ 118 and DFE 216 are both discrete time equalizers that require a clock. The
minimum required frequency of operation of the filters is the data rate frequency. Therefore,
a 10Gb/s data stream would require the equalizers to be clocked with a 10GHz clock, assuming
the circuit is active on a single rising or falling edge of said clock. Equivalently, the
period T of the clock would need to be the same as the duration of a single data bit, where
this duration is generally referred to as a Unit Interval (UI). For 10Gb/s operation, the UI
is 100ps.

[00032] FIG. 3a shows a simplified block diagram of a transversal FIR filter using single edge
clocking. A transversal FIR filter 300 includes shift register 302, which has delay elements
304 serially connected to the data signal on line 308. The delay elements are switched by a
clock signal on line 306 at a switching rate that is equal to the bit rate. The delay τ of
each element is equal to one bit period. This interval can also be expressed as the period T
of the clock signal appearing on line 306. The output of each delay element 304 can then be
used to drive a coefficient value 310 to generate the FIR filter response 314. In the context
of a 10Gb/s SerDes, generation of a 10GHz clock and design of logic that can operate correctly
within 100ps is very challenging in contemporary CMOS technology, and there would be great
advantages in performance and power consumption if the circuitry could operate at a lower
frequency.
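
The single-edge structure of FIG. 3a can be sketched behaviourally as follows. This is a minimal Python model and not the analog circuit: the function name `transversal_fir`, the zero reset state, and the use of signed numeric samples are assumptions made for illustration. Each delay element holds its sample for one UI, and the summing node adds the coefficient-weighted outputs of the delay elements.

```python
from collections import deque


def transversal_fir(samples, coefficients):
    """Behavioural model of a full-rate transversal FIR filter.

    All delay elements shift once per UI. During UI k the i-th delay
    element holds samples[k - 1 - i], so the response is
    y[k] = sum_i coefficients[i] * samples[k - 1 - i].
    """
    # Shift register, assumed (for illustration) to reset to zero.
    taps = deque([0.0] * len(coefficients), maxlen=len(coefficients))
    response = []
    for sample in samples:
        # Summing node: weighted sum of what the delay elements hold now.
        response.append(sum(c * d for c, d in zip(coefficients, taps)))
        # Active clock edge: every element takes its neighbour's value.
        taps.appendleft(sample)
    return response


print(transversal_fir([1, -1, 1, 1], [0.5, 0.25]))
```

Because every element must shift once per UI, this model corresponds to a clock whose period equals one UI.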

[00033] Power consumption and design complexity of the FIR filter may be reduced if the
frequency of operation of the filter is reduced by a factor Q. The clock frequency of the
filter 300 may be reduced by a factor Q, where

$$Q = 2^{p}, \quad p \in \{0, 1, 2, 3, \ldots\} \qquad \text{(eq. 1)}$$

and where p is an integer.

[00034] FIG. 3b is a simplified block diagram of an embodiment of a shift register matrix 331
constructed in accordance with the principles of the invention. The shift register matrix has
a matrix of delay elements 334, the arrangement and operation of which is as follows. The data
signal on line 330 is provided to Q first delay elements 334 arranged as the Q rows of delay
elements 334 in the first or left most column of the matrix of delay elements. In this
arrangement, the delay of each delay element is Q × τ, where, again, τ is equal to one bit
period. This interval can also be expressed as a multiple Q of the period T of the data clock
of the data signal appearing on line 330. Thus the delay elements 334 operate at a reduced
frequency relative to those in the configuration of FIG. 3a for an equal data rate on lines
308 and 330. The clock signal on line 332 has a period of Q × T, where T is the period of the
reference data clock. Expressed another way, the clock signal on line 332 has a frequency that
is a Q sub-multiple of the data clock frequency; that is, the frequency of the clock on line
332 is 1/Q of the reference data rate.

[00035] Thus in the implementation of FIG. 3b, reduced power consumption is achieved as the
frequency of operation of the filter is reduced by a factor Q from the reference data rate.
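
The relationship between data rate, reduction factor Q, and latch clock can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function names `unit_interval_ps` and `reduced_clock` are hypothetical and the stream is assumed to carry exactly one bit per UI.

```python
def unit_interval_ps(bit_rate_gbps):
    """Duration of one bit (one UI) in picoseconds for a rate in Gb/s."""
    return 1000.0 / bit_rate_gbps


def reduced_clock(bit_rate_gbps, q):
    """Return (clock frequency in GHz, clock period in ps) for factor q.

    q must be a power of two. The clock runs at 1/q of the data rate,
    so its period is q times the UI.
    """
    if q < 1 or q & (q - 1):
        raise ValueError("q must be a power of two")
    frequency_ghz = bit_rate_gbps / q
    period_ps = q * unit_interval_ps(bit_rate_gbps)
    return frequency_ghz, period_ps


print(reduced_clock(10.0, 1))  # full-rate clocking
print(reduced_clock(10.0, 2))  # double-edge (half-rate) clocking
```

For a 10Gb/s stream and Q=2 this gives a 5GHz clock with a 200ps period, which is also the time each latch holds its sample.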

[00036] The shift register matrix of FIG. 3b is configured to shift data correctly without
dropping any bits. This is achieved by using latches 334 that operate on Q equally spaced
phases of the clock signal appearing on line 332, such that the data 330 is captured every UI.
Phase delay blocks 336, whose phase delay contribution is defined by the relationship of
Eq. 2, generate the clock signals appearing on lines 338.

$$\phi_{n} = n \times \frac{360^{\circ}}{Q}, \quad n \in \{0, 1, \ldots, Q-1\} \qquad \text{(eq. 2)}$$

[00037] The delay of the latches 334 is increased by a factor of Q, so that the overall timing
of the system is preserved. This system has the advantage of operating using a latch 334 that
is Q times slower than the latch 304 in FIG. 3a. However, where a shift register matrix 331 is
used on its own to implement an FIR filter, the response of the filter would be incorrect
regardless of the coefficient settings. The change in the duration of the latched data signal
passing along line 340 will prevent a filter constructed using the shift register matrix of
FIG. 3b from generating a response at the frequency of interest. An additional circuit is
required to allow the shift register matrix 331 to be used as part of a transversal FIR filter.
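
The following sketch illustrates the two points made above: the Q clock phases are equally spaced over 360 degrees, and each first-column latch holds its sample for Q UI rather than one. It is a simplified behavioural model under assumed ideal sampling; `clock_phases_deg` and `first_column_outputs` are hypothetical names, and `None` stands for a latch that has not yet captured data.

```python
def clock_phases_deg(q):
    """Phase offsets, in degrees, of the q equally spaced clock phases."""
    return [n * 360.0 / q for n in range(q)]


def first_column_outputs(bits, q):
    """Trace what each of the q first-column latches holds.

    Row n samples the input on clock phase n, i.e. on every q-th UI
    starting at UI n, and holds that sample until its next active edge.
    trace[k] is the tuple of latch outputs just after bit k is captured.
    """
    held = [None] * q
    trace = []
    for k, bit in enumerate(bits):
        held[k % q] = bit  # only one row has an active edge in UI k
        trace.append(tuple(held))
    return trace


print(clock_phases_deg(2))
for row in first_column_outputs("abcd", 2):
    print(row)
```

In the trace for Q=2 every bit is captured exactly once, so no data is dropped, but each bit stays visible on its latch output for two consecutive UI. A coefficient driven directly from one latch would therefore see each bit for two UI, which is why a further selection circuit is needed.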

[00038] FIG. 4 is a detailed block level diagram of a multiplexer-multiplier (Mux-Mul) 400
constructed in accordance with the principles of the invention. A multiplexer 410 with Q
inputs 414 has an input select control 412 that selects one of the inputs 414 as the output to
summer 408. When the Mux-Mul 400 is configured with a shift register matrix 331 of FIG. 3b,
the selected input line of the Q inputs 414 is multiplexed at the same frequency or clock rate
as the serial data signal appearing on line 330. That is, the multiplexer 410 is required to
select continuously among the Q inputs 414, each for a duration of 1UI. The output of the
multiplexer 410 accommodates a scaling factor related to a gain 404 and a polarity according
to a sign 402. This combination of functionality is referred to as a multiplexer-multiplier
(Mux-Mul) 400.

[00039] Modifying the selected input value with respect to polarity and gain produces the same
effect as a coefficient multiplier. If the maximum signal swing is normalized with the desired
signal amplitude, then it is possible to scale the output of the Mux-Mul as a function of
control inputs 404 and 402. By construction, this mechanism can be related directly to a
coefficient used in an FIR filter. The coefficient value is given by:

$$C(n) = D[n] \times G[x:0] \times \operatorname{sgn}(G) \qquad \text{(eq. 3)}$$

where C(n) 406 represents the applied response related to the nth coefficient of a given
transversal filter, D[n] represents any one input decision data 414, G[x:0] represents the
normalized magnitude 404 of the gain associated with the coefficient, and sgn(G) represents
the sign 402 of the gain which is applied. It should be noted that only the input select 412
is changing at the same rate as the input data stream, and this allows the Mux-Mul to simulate
the effect of a baud-spaced transversal filter for each coefficient.
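
Eq. 3 can be illustrated with a small numeric model of one Mux-Mul. This is a sketch under stated assumptions rather than the circuit itself: the function name `mux_mul` is hypothetical, the gain magnitude G[x:0] is read here as an unsigned binary control word normalized to full scale, and the 4-bit default width is an arbitrary choice for the example.

```python
def mux_mul(inputs, select, gain_code, sign, gain_bits=4):
    """Numeric model of one Mux-Mul output, following eq. 3.

    inputs    -- the Q latched decision values (D)
    select    -- index of the input chosen for the current UI
    gain_code -- unsigned gain word (assumed gain_bits wide)
    sign      -- +1 or -1, the polarity applied to the gain
    """
    if sign not in (1, -1):
        raise ValueError("sign must be +1 or -1")
    full_scale = (1 << gain_bits) - 1
    if not 0 <= gain_code <= full_scale:
        raise ValueError("gain_code out of range")
    magnitude = gain_code / full_scale  # normalized to the range 0..1
    return inputs[select] * magnitude * sign


# Q = 2: even latch holds +1, odd latch holds -1.
print(mux_mul([1.0, -1.0], select=0, gain_code=15, sign=1))
print(mux_mul([1.0, -1.0], select=1, gain_code=15, sign=-1))
```

Only `select` needs to change every UI; the gain word and sign stay fixed for a given coefficient setting.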

[00040] Q Mux-Muls are required for every stage of a transversal filter, which means that
every stage of the transversal filter provides Q coefficients. Further references to the
Mux-Mul structure and transversal filters will be based on Q=2, even though it is understood
that Q can be any power of two as per Eq. 1. When Q=2, the phase delay 336 is 180 degrees,
which corresponds to the complement of the filter clock. In the exemplary 10Gb/s SerDes, a
single clock phase is used throughout the circuit to further decrease complexity. Decision
logic is designed to be active on the falling edge of the clock in order to implement the
section of the transversal filter operating on the phase delayed clock 338.



[00041] FIG. 5 is a detailed block diagram of the first stage 500 of a transversal filter that
uses the present invention 400 in the context of the exemplary 10Gb/s SerDes. The stage has a
10Gb/s serial input data stream 502, a 5GHz clock input 504, coefficient control signals 402
and 404, coefficient outputs 501 and 503, and latched data outputs 506 and 508 that are the
even and odd decisions of the latches 514 and 516. Rising-edge active latch 514 and
falling-edge active latch 516 each have a delay time of 200ps. The Mux-Muls 400 each have two
data inputs 510 and 512, and an input select 412. Since Q=2, there are two coefficients 501
and 503 that are generated by the stage. The even and odd decision data 506 and 508 are held
for 200ps and will be used by the next stage in the filter. The 5GHz clock 504 is used as the
logical input select 412 for each Mux-Mul, where one input 510 or 512 is selected in
alternating fashion for 1UI.
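
A behavioural sketch of one such first stage is given below. It assumes ideal edge-triggered latches that reset to zero and models time in whole UI; the function name `first_stage` and the assignment of the two coefficient outputs to the "most recent" and "older" bit are assumptions of this example, not details confirmed by the figures.

```python
def first_stage(bits, c1, c2):
    """Model a Q=2 first stage: two half-rate latches and two Mux-Muls.

    Even-indexed bits are captured by the rising-edge latch and
    odd-indexed bits by the falling-edge latch; each is held for 2 UI.
    In every UI one Mux-Mul selects the latch that was updated most
    recently and the other selects the remaining latch.
    Returns the two coefficient output sequences, one value per UI.
    """
    even = odd = 0.0  # latch outputs, assumed to reset to zero
    out_newest, out_older = [], []
    for k, bit in enumerate(bits):
        # During UI k the latch that sampled bit k-1 is the fresh one.
        fresh_is_even = (k - 1) % 2 == 0
        newest, older = (even, odd) if fresh_is_even else (odd, even)
        out_newest.append(c1 * newest)  # carries bit k-1 for exactly 1 UI
        out_older.append(c2 * older)    # carries bit k-2 for exactly 1 UI
        # Sampling instant of UI k: update only the latch for this edge.
        if k % 2 == 0:
            even = bit
        else:
            odd = bit
    return out_newest, out_older


newest, older = first_stage([1, -1, -1, 1], c1=0.5, c2=0.25)
print(newest)
print(older)
```

Although each latch output changes only every 2 UI, alternating the selection every UI yields two outputs that each change every UI, equivalent to taps at one and two UI of delay.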

[00042] In the case where analog latches are used to build the transversal filter, as opposed
to true digital Flip-Flops, the outputs of the even latches are connected to the inputs of the
odd latches and vice versa. This has to do with the fact that latches, unlike Flip-Flops, have
a tracking stage that starts when the clock is low, and a regeneration stage that starts when
the clock is high. If a series of identical latches were cascaded to form a shift register,
all the latches would enter either tracking or regeneration mode at the same time, which would
cause the circuit to fail. By alternating active high and active low latches in series, a
following latch will track what the previous latch has regenerated, and thus the data can
properly move through the shift structure. The overall functionality of the transversal filter
is not changed, and this point is specifically related to the implementation of the circuit,
not the principle of the present invention. FIG. 8a and FIG. 8b illustrate this interconnect
scheme clearly.

[00043] FIG. 6 is a timing diagram 600 related to the first slice of the FIR filter. The
incoming data signal 504 is a serial bit stream, with a data period of 1UI. The clock signal
602 has a period of 2UI. The decision data signal 614 for the even latch 514 has a UI advance
on the decision data signal 616 of the odd latch 516. The lowercase notation bX denotes the
incoming serial data stream with duration 1UI, and the uppercase notation BX denotes the
corresponding decision data with duration 2UI. The overall effect of the present invention can
be seen in the coefficient output signals 610 and 612. There are two output signals 610 and
612 with duration 1UI that respect Eq. 3 and produce the desired coefficient response required
for the FIR filter application. FIG. 7 is a similar timing diagram 700 that is specific to the
rest of the slices in the FIR filter, where latches are acting on decisions made by a previous
stage in the FIR filter. The decision signals 708 and 710 are delayed versions of the decision
signals 704 and 706 from the previous stage of the filter. The data signals 708 and 710 are
used to drive a new set of coefficient signals 712 and 714.

[00044] The basic structure 500 can be used to build a Feed-Forward transversal FIR filter or
a Decision Feedback FIR filter. If the Mux-Mul coefficient outputs are summed for every stage
and used as a stand-alone response, the filter is considered a Feed-Forward Equalizer (FFE)
that would correspond to the TX_EQ 118. If the coefficient outputs for every stage are summed
and fed back to the input of the filter, it is considered a Decision Feedback Equalizer (DFE)
that would correspond with the DFE 216.

[00045] FIG. 8 outlines a DFE 124 that is configured to have six coefficients 806 using three
stages 500, and which is used in the exemplary 10Gb/s SerDes Receive path 202. The output
coefficient signals 806 are summed and fed back to the input of the first stage, where they
are combined with the output of the HPF 206 at the summing node 208. The entire structure is
synchronously clocked using the recovered 5GHz clock 228. The data outputs 802 and 804 from
the last stage of the DFE form the data input 222 to the DEMUX 122. The DEMUX block 122 has
one less stage due to the fact that the original serial bit stream was already de-multiplexed
by a factor of two by the DFE, hence Data_even 802 and Data_odd 804.
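
A three-stage, six-coefficient decision feedback loop of this kind can be sketched behaviourally as below. The model is illustrative only: it assumes ideal latches, ±1 symbols, a slicer threshold at zero, and a hypothetical channel whose only impairment is post-cursor ISI. The names `apply_channel` and `dfe_half_rate` and the example tap values are not taken from the source.

```python
def apply_channel(bits, post_cursors):
    """Add post-cursor ISI: r[k] = b[k] + sum_j post_cursors[j] * b[k-1-j]."""
    received = []
    for k, bit in enumerate(bits):
        isi = sum(h * bits[k - 1 - j]
                  for j, h in enumerate(post_cursors) if k - 1 - j >= 0)
        received.append(bit + isi)
    return received


def dfe_half_rate(received, coefficients):
    """Q=2 decision feedback equalizer with len(coefficients) // 2 stages.

    Decisions alternate into an even and an odd row of latches. In each
    UI, stage s contributes two feedback terms: one from the row updated
    most recently and one from the other row.
    """
    stages = len(coefficients) // 2
    even = [0.0] * stages
    odd = [0.0] * stages
    decisions = []
    for k, sample in enumerate(received):
        fresh, stale = (even, odd) if (k - 1) % 2 == 0 else (odd, even)
        feedback = sum(coefficients[2 * s] * fresh[s]
                       + coefficients[2 * s + 1] * stale[s]
                       for s in range(stages))
        decision = 1.0 if sample + feedback >= 0 else -1.0  # slicer
        decisions.append(decision)
        row = even if k % 2 == 0 else odd  # row clocked in this UI
        row.insert(0, decision)
        row.pop()
    return decisions


sent = [1.0, -1.0, -1.0, 1.0, 1.0, 1.0, -1.0, 1.0, -1.0, -1.0, 1.0, -1.0]
isi_taps = [0.5, 0.25, -0.25, 0.125, 0.0, 0.125]  # hypothetical channel
recovered = dfe_half_rate(apply_channel(sent, isi_taps),
                          [-h for h in isi_taps])
print(recovered == sent)
```

In this sketch the feedback coefficients are simply the negated ISI taps, so the summed feedback cancels the ISI at the slicer input. In the described system the coefficient settings would instead come from the adaptation algorithm.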

[00046] FIG. 9 outlines a TX_EQ 118 that is configured to have four coefficients using two
stages 500, and which is used in the exemplary 10Gb/s SerDes Transmit path 230. The output
coefficient signals 906 are summed at 908 and fed forward into the TX_IO driver 120. The
entire structure is synchronously clocked with the locally generated clock 236. The input data
242 from the MUX 116 requires a final stage of multiplexing. The final stage of the
multiplexing is performed by the Mux-Muls within the transversal filter, where D_even 902 and
D_odd 904 are multiplexed into a serial response 910.
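
As a sanity check on the feed-forward case, the sketch below compares a two-stage, four-coefficient half-rate structure against a conventional full-rate FIR on the same data. It is a behavioural model under assumed ideal latches with a zero reset state; `half_rate_fir` and `full_rate_fir` are hypothetical names and the coefficient values are arbitrary.

```python
def full_rate_fir(bits, coefficients):
    """Reference response: y[k] = sum_i coefficients[i] * bits[k-1-i]."""
    return [sum(c * bits[k - 1 - i]
                for i, c in enumerate(coefficients) if k - 1 - i >= 0)
            for k in range(len(bits))]


def half_rate_fir(bits, coefficients):
    """Q=2 transversal filter built from even/odd rows and Mux-Muls."""
    if len(coefficients) % 2:
        raise ValueError("this Q=2 sketch needs an even number of taps")
    stages = len(coefficients) // 2
    even = [0.0] * stages  # rising-edge row, one latch per stage
    odd = [0.0] * stages   # falling-edge row, one latch per stage
    response = []
    for k, bit in enumerate(bits):
        # Mux-Mul select: which row was updated most recently.
        fresh, stale = (even, odd) if (k - 1) % 2 == 0 else (odd, even)
        y = 0.0
        for s in range(stages):
            y += coefficients[2 * s] * fresh[s]      # delay 2s + 1 UI
            y += coefficients[2 * s + 1] * stale[s]  # delay 2s + 2 UI
        response.append(y)
        row = even if k % 2 == 0 else odd  # only one row shifts per UI
        row.insert(0, bit)
        row.pop()
    return response


data = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, 1.0, 1.0]
taps = [0.5, -0.25, 0.125, 0.0625]
print(half_rate_fir(data, taps) == full_rate_fir(data, taps))
```

Each row shifts only once every two UI, yet the summed Mux-Mul outputs reproduce the baud-spaced response of the full-rate filter sample for sample.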